NSF PAR Search | NSF Public Access Repository

CachedArrays: Optimizing Data Movement for Heterogeneous Memory Systems

Hildebrand, Mark; Lowe-Power, Jason; Akella, Venkatesh (May 2024, 38th IEEE International Parallel and Distributed Processing Symposium (IPDPS))

Full Text Available
Efficient Large Scale DLRM Implementation On Heterogeneous Memory Systems

https://doi.org/10.1007/978-3-031-32041-5_3

Hildebrand, Mark; Lowe-Power, Jason; Akella, Venkatesh (May 2023, High Performance Computing: 38th International Conference, ISC High Performance 2023, Hamburg, Germany, May 21–25, 2023, Proceedings)

We propose a new data structure called CachedEmbeddings for training large-scale deep learning recommendation models (DLRM) efficiently on heterogeneous (DRAM + non-volatile) memory platforms. CachedEmbeddings implements an implicit software-managed cache and data movement optimization, integrated with the Julia programming framework, to optimize large-scale DLRM implementations with multiple sparse embedding table operations. In particular, we show an implementation that is 1.4X to 2X better than the best known Intel CPU-based implementations on state-of-the-art DLRM benchmarks on a real heterogeneous memory platform from Intel, and a 1.32X to 1.45X improvement over Intel’s 2LM implementation, which treats the DRAM as a hardware-managed cache.
Full Text Available
A Case Against Hardware Managed DRAM Caches for NVRAM Based Systems

https://doi.org/10.1109/ISPASS51385.2021.00036

Hildebrand, Mark; Angeles, Julian T.; Lowe-Power, Jason; Akella, Venkatesh (March 2021, 2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS))
Non-volatile memory (NVRAM) based on phase-change memory (such as the Optane DC Persistent Memory Module) is making its way into Intel servers to address the needs of emerging applications that have a huge memory footprint. These systems have both DRAM and NVRAM on the same memory channel, with the smaller-capacity DRAM serving as a cache to the larger-capacity NVRAM in the so-called 2LM mode. In this work, we analyze the performance of such DRAM caches on real hardware using a broad range of synthetic and real-world benchmarks. We identify three key limitations of DRAM caches in these emerging systems that prevent large-scale, bandwidth-bound applications from taking full advantage of NVRAM read and write bandwidth. We show that software-based techniques are necessary for orchestrating the data movement between DRAM and PMM for such workloads to take full advantage of these new heterogeneous memory systems.
Full Text Available
Diel Transcriptional Oscillations of a Plastid Antiporter Reflect Increased Resilience of Thalassiosira pseudonana in Elevated CO2

https://doi.org/10.3389/fmars.2021.633225

Valenzuela, Jacob J.; Ashworth, Justin; Cusick, Allison; Abbriano, Raffaela M.; Armbrust, E. Virginia; Hildebrand, Mark; Orellana, Mónica V.; Baliga, Nitin S. (May 2021, Frontiers in Marine Science)

Acidification of the ocean due to high atmospheric CO2 levels may increase the resilience of diatoms, causing dramatic shifts in abiotic and biotic cycles with lasting implications on marine ecosystems. Here, we report a potential bioindicator of a shift in the resilience of a coastal and centric model diatom Thalassiosira pseudonana under elevated CO2. Specifically, we have discovered, through EGFP-tagging, a plastid membrane localized putative Na+(K+)/H+ antiporter that is significantly upregulated at >800 ppm CO2, with a potentially important role in maintaining pH homeostasis. Notably, transcript abundance of this antiporter gene was relatively low and constant over the diel cycle under contemporary CO2 conditions. In future acidified oceanic conditions, dramatic oscillation with >10-fold change between nighttime (high) and daytime (low) transcript abundances of the antiporter was associated with increased resilience of T. pseudonana. By analyzing metatranscriptomic data from the Tara Oceans project, we demonstrate that phylogenetically diverse diatoms express homologs of this antiporter across the globe. We propose that the differential between night- and daytime transcript levels of the antiporter could serve as a bioindicator of a shift in the resilience of diatoms in response to high CO2 conditions in marine environments.
Full Text Available
AutoTM: Automatic Tensor Movement in Heterogeneous Memory Systems using Integer Linear Programming

https://doi.org/10.1145/3373376.3378465

Hildebrand, Mark; Khan, Jawad; Trika, Sanjeev; Lowe-Power, Jason; Akella, Venkatesh (March 2020, ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems)

Memory capacity is a key bottleneck for training large-scale neural networks. Intel® Optane DC PMM (persistent memory modules), which are available as NVDIMMs, are a disruptive technology that promises significantly higher read bandwidth than traditional SSDs at a lower cost per bit than traditional DRAM. In this work, we show how to take advantage of this new memory technology to minimize the amount of DRAM required without compromising performance significantly. Specifically, we take advantage of the static nature of the underlying computational graphs in deep neural network applications to develop a profile-guided optimization based on Integer Linear Programming (ILP), called AutoTM, to optimally assign and move live tensors to either DRAM or NVDIMMs. Our approach can replace 50% to 80% of a system's DRAM with PMM while losing only a geometric mean of 27.7% performance. This is a significant improvement over first-touch NUMA, which loses 71.9% of performance. The proposed ILP-based synchronous scheduling technique also provides 2x performance over using DRAM as a hardware-controlled cache for very large networks.
Full Text Available
Bionic 3D printed corals

https://doi.org/10.1038/s41467-020-15486-4

Wangpraseurt, Daniel; You, Shangting; Azam, Farooq; Jacucci, Gianni; Gaidarenko, Olga; Hildebrand, Mark; Kühl, Michael; Smith, Alison G.; Davey, Matthew P.; Smith, Alyssa; et al (December 2020, Nature Communications)

Full Text Available
